Automatic Morphological Analysis for Russian: a Comparative Study
Abstract
In this paper we present a comparison of ten systems for automatic morphological analysis: TreeTagger, TnT, HunPos, Lapos, Citar, Morfette, Mystem, Pymorphy, Stanford POS tagger and SVMTool. We discuss the different training and tagging approaches together with the strengths and weaknesses of each system. Probabilistic taggers were trained and tested on the Russian National Disambiguated Corpus and achieved accuracy scores as high as 96.94% on POS tags and 92.56% on the whole tagset. However, most of the existing taggers cannot resolve various cases of morphological ambiguity and show a better performance for morphologically rich languages. We believe that a detailed examination of the errors caused by homonymy can help to solve the disambiguation problem and to improve tagging results.
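The two accuracy figures in the abstract measure different things: a token counts as correct at the POS level if only its part of speech matches, and at the whole-tagset level only if every grammatical feature matches as well. The sketch below illustrates that distinction on toy data. The function name `tagging_accuracy`, the comma-separated tag format and the sample tags are assumptions made for illustration; they are not taken from the paper or from any of the compared taggers.

```python
def tagging_accuracy(gold, predicted):
    """Return (pos_accuracy, full_tag_accuracy) for two aligned tag sequences.

    Assumed (hypothetical) tag format: "POS,feat1,feat2,...", e.g. "S,m,sg,nom".
    The part of speech is the first comma-separated field.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    pairs = list(zip(gold, predicted))
    # POS-level hit: only the first field has to match.
    pos_hits = sum(g.split(",")[0] == p.split(",")[0] for g, p in pairs)
    # Whole-tagset hit: the complete tag string has to match.
    full_hits = sum(g == p for g, p in pairs)
    return pos_hits / len(gold), full_hits / len(gold)


# Toy data: the third token has the right part of speech but the wrong case,
# the kind of error that nominative/accusative homonymy can produce.
gold = ["S,f,sg,nom", "V,ipf,sg,3p", "S,m,sg,acc", "PR"]
pred = ["S,f,sg,nom", "V,ipf,sg,3p", "S,m,sg,nom", "PR"]

pos_acc, full_acc = tagging_accuracy(gold, pred)
print(f"POS accuracy: {pos_acc:.2%}, whole-tagset accuracy: {full_acc:.2%}")
```

On this toy input the case error lowers only the whole-tagset score, which is one reason the whole-tagset figure can fall below the POS figure.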
Similar resources
A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources
In this paper, we describe a resource-light system for the automatic morphological analysis and tagging of Russian. We eschew the use of extensive resources (particularly, large annotated corpora and lexicons), exploiting instead (i) pre-existing annotated corpora of Czech; (ii) an unannotated corpus of Russian. We show that our approach has benefits, and present what we believe to be one of th...
Multilingual Word Sense Discrimination: A Comparative Cross-Linguistic Study
We describe a study that evaluates an approach to Word Sense Discrimination on three languages with different linguistic structures, English, Hebrew, and Russian. The goal of the study is to determine whether there are significant performance differences for the languages and to identify language-specific problems. The algorithm is tested on semantically ambiguous words using data from Wikipedi...
Morphological Analysis of Inflective Languages through Generation
A crucial problem in the development of systems for automatic morphological analysis of inflective languages is the treatment of stem alternations. The existing models require the development of corresponding rules that specify which stems can be generated from a given one. Many such rules (e.g., about a thousand for Russian) do not have any reasonable linguistic interpretation. We suggest a met...
A Fault Diagnosis Method for Automaton based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition
In the fault diagnosis of an automaton, the vibration signal is non-stationary and non-periodic, which makes it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...
Publication date: 2016